Enhance parallel foreign scan support #1571
MisterRaindrop wants to merge 1 commit into apache:main
Conversation
Add a new mock foreign data wrapper (FDW) for testing parallel foreign scans. Key changes include:

- Implementation of `parallel_foreign_scan_test_fdw` to generate synthetic rows and support parallel scanning.
- Modifications to the optimizer to generate gather paths for foreign tables with parallel capabilities.
- Updates to `execMain.c` to enable parallel mode for gather nodes based on the execution context.
- Addition of test cases to validate the functionality of the new FDW in both coordinator and all-segments modes.
avamingli
left a comment
Overall, FDW parallel scan is a direction worth exploring, but this approach is too rough. The core problems are:

- Locus transition semantics for Gather in an MPP context haven't been thought through, and the changes are too broad.
- FDW is a black box from the database's perspective. For heap tables we have parallel scan (divide work by pages); for AO/AOCS we have parallel scan (divide work by files); the work partitioning is well-defined. But for FDWs, the parallel behavior depends entirely on the FDW's own implementation. If an FDW (say file_fdw) sets parallel_safe = true following the planner's parallel logic but doesn't actually implement the DSM parallel callbacks (EstimateDSMForeignScan, InitializeDSMForeignScan, InitializeWorkerForeignScan), then multiple workers will each scan the full dataset, producing duplicate rows.
```c
/* Inherit locus from subpath — Gather collects within the same segment,
 * data distribution across segments doesn't change. */
pathnode->path.locus = subpath->locus;
pathnode->path.locus.parallel_workers = 0; /* Gather output is single-stream */

pathnode->path.motionHazard = subpath->motionHazard;
pathnode->path.barrierHazard = subpath->barrierHazard;
pathnode->path.rescannable = false;
pathnode->path.sameslice_relids = subpath->sameslice_relids;
```
create_gather_path was guarded by Assert(false) for a reason — the locus semantics of Gather in an MPP context were never fully defined or implemented.
The comment is wrong, and the conclusion is wrong.
Gather does change the data distribution.
Simplest example: a Hash-distributed table's parallel partial scan has locus HashWorkers (data is hash-distributed across segments, and within each segment it's split across workers).
Gather collects all workers' data back to the leader process, so the locus should become Hash. You can't just copy the subpath locus and zero out parallel_workers — that's incorrect for HashWorkers, SegmentGeneral, and possibly other locus types as well.
Also, this change is global — create_gather_path isn't FDW-specific. Once the Assert is gone, every code path that calls this function is affected. The hardest case is JOINs — mixing Gather with CBDB-style parallelism (Motion/slice) introduces a ton of problems. This PR doesn't seem to have considered any of that.
I'm not very familiar with Cloudberry yet, still learning. FDW itself is a black box; its behavior largely depends on how the user implements it, so my understanding was that users need to take responsibility for their own implementations. Also, I intended to enable Gather only for FDWs and keep it disabled in all other cases; wouldn't that preserve PostgreSQL's parallel-processing advantages? I've also looked into other approaches to FDW parallelism, and there doesn't seem to be an obviously best solution. Should we aim to implement parallelism that is transparent to users, or is there a better approach? Could you share some ideas?
Neither PostgreSQL nor Cloudberry supports parallel FDW scans, that's a deliberate decision, not an oversight. On the implementation side: having the kernel generate partial paths for FDW will cause FDWs that don't implement parallel scan callbacks to silently produce wrong results (e.g. duplicate rows). That's a kernel bug, not a user error — we can't shift that responsibility to FDW authors. And mixing Gather with CBDB-style parallelism remains fundamentally broken — the locus handling is wrong, and none of the issues I raised (joins, locus transitions, the overly broad execMain.c change) have been addressed. More importantly, before discussing how, we need to answer why. What real-world problem does this solve in an MPP system where FDW is already used across segments? And given the risks I mentioned above — broken locus transitions, silent wrong results for existing FDWs, untested join/subquery interactions — even if it can be done, is it worth the complexity? If you want to push this forward, you need to make the case clearly: what's the motivation, and convince us that all the issues raised have sound solutions.
Parallel FDW primarily addresses the issue of slow data loading. This functionality was already implemented in earlier versions of PostgreSQL. Now, I am attempting to integrate this feature into MPP systems. In simple tests, parallelization has indeed delivered a performance improvement of one to two times. Such gains are essential for performance-sensitive business scenarios. Therefore, I am working to introduce this functionality. Alternatively, we could discuss the implementation plan in the issue tracker.
Thank you for the detailed review comments. Regarding the core issues raised (the correctness impact of partial paths on FDWs that do not implement the parallel callbacks, the mixing of Gather and CBDB gang models, locus conversion, and the scope of changes in execMain.c), I agree that these all need to be addressed seriously. After reconsideration, I am inclined to withdraw the kernel-side modifications and adopt a pure FDW-layer solution instead. The core idea is:
This solution requires no kernel modifications and will not affect other FDWs. I would like to confirm: Is this direction reasonable? Are the variables ParallelWorkerNumberOfSlice and TotalParallelWorkerNumberOfSlice stable and reliable under the current CBDB parallel framework? Or do you have a more recommended way for the FDW to perceive gang parallel information?
Among the existing parallel frameworks in CBDB, after FDW registers a partial path via add_partial_path(), can the planner correctly trigger gang expansion and set ParallelWorkerNumberOfSlice? Or does it require additional kernel adaptation?
Where exactly? Are you referring to this commit?
That sounds more reasonable. |
Yes, they are stable and reliable under the current CBDB parallel framework. But I'm not sure how you plan to use them.
I'm not entirely sure I follow — isn't this essentially how MPP PXF works today?
Yes, essentially, it reuses the existing MPP round-robin sharding mechanism of PXF — by modifying the segment ID/count in the HTTP header, PXF can distribute data to N×W gang workers instead of N physical segments. No changes are required on the PXF server side. Regarding ParallelWorkerNumberOfSlice: from the assignment logic in parallel.c, workers on the same segment are assigned incrementally via DSM entry (0, 1, 2, ...), which should be unique. However, I want to confirm: in the CBDB parallel framework, is this value guaranteed to be unique within the same slice on the same segment?
What does this PR do?

Support parallel foreign scans and add a new mock FDW for testing them. Key changes include:

- `parallel_foreign_scan_test_fdw` to generate synthetic rows and support parallel scanning.
- `execMain.c` to enable parallel mode for gather nodes based on the execution context.

Test Plan

- `make installcheck`
- `make -C src/test installcheck-cbdb-parallel`